Recovering the number of clusters in data sets with noise features using feature rescaling factors
نویسندگان
چکیده
In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the p power of the Minkowski distance), Dunn’s, Calinski-Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.
منابع مشابه
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
متن کاملMinkowski Metric , Feature Weighting and Anomalous Cluster Initializing in K - Means Clustering Renato
This paper represents another step in overcoming a drawback of K-Means, its lack of defense against noisy features, by using feature weights in the criterion. The Weighted K-Means method by Huang et al. is extended to the corresponding Minkowski metric for measuring distances. Under Minkowski metric the feature weights become intuitively appealing feature rescaling factors in a conventional K-M...
متن کاملMinkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
This paper represents another step in overcoming a drawback of K-Means, its lack of defense against noisy features, using feature weights in the criterion. The Weighted K-Means method by Huang et al. (2008, 2004, 2005) [5–7] is extended to the corresponding Minkowski metric for measuring distances. Under Minkowski metric the feature weights become intuitively appealing feature rescaling factors...
متن کاملClassification of ECG signals using Hermite functions and MLP neural networks
Classification of heart arrhythmia is an important step in developing devices for monitoring the health of individuals. This paper proposes a three module system for classification of electrocardiogram (ECG) beats. These modules are: denoising module, feature extraction module and a classification module. In the first module the stationary wavelet transform (SWF) is used for noise reduction of ...
متن کاملFault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods
Anti-Friction Bearing (AFB) is a very important machine component and its unscheduled failure leads to cause of malfunction in wide range of rotating machinery which results in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Inf. Sci.
دوره 324 شماره
صفحات -
تاریخ انتشار 2015